Clustering by Left-Stochastic Matrix Factorization

نویسندگان

  • Raman Arora
  • Maya R. Gupta
  • Amol Kapila
  • Maryam Fazel
چکیده

We propose clustering samples given their pairwise similarities by factorizing the similarity matrix into the product of a cluster probability matrix and its transpose. We propose a rotation-based algorithm to compute this left-stochastic decomposition (LSD). Theoretical results link the LSD clustering method to a soft kernel k-means clustering, give conditions for when the factorization and clustering are unique, and provide error bounds. Experimental results on simulated and real similarity datasets show that the proposed method reliably provides accurate clusterings.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Similarity-based clustering by left-stochastic matrix factorization

For similarity-based clustering, we propose modeling the entries of a given similarity matrix as the inner products of the unknown cluster probabilities. To estimate the cluster probabilities from the given similarity matrix, we introduce a left-stochastic non-negative matrix factorization problem. A rotation-based algorithm is proposed for the matrix factorization. Conditions for unique matrix...

متن کامل

A Projected Alternating Least square Approach for Computation of Nonnegative Matrix Factorization

Nonnegative matrix factorization (NMF) is a common method in data mining that have been used in different applications as a dimension reduction, classification or clustering method. Methods in alternating least square (ALS) approach usually used to solve this non-convex minimization problem.  At each step of ALS algorithms two convex least square problems should be solved, which causes high com...

متن کامل

Consistent parameter estimation in general stochastic block models with overlaps

This paper considers the parameter estimation problem in Stochastic Block Model with Overlaps (SBMO), which is a quite general instance of random graph model allowing for overlapping community structure. We present the new algorithm successive projection overlapping clustering (SPOC) which combines the ideas of spectral clustering and geometric approach for separable non-negative matrix factori...

متن کامل

Low-Rank Doubly Stochastic Matrix Decomposition for Cluster Analysis

Cluster analysis by nonnegative low-rank approximations has experienced a remarkable progress in the past decade. However, the majority of such approximation approaches are still restricted to nonnegative matrix factorization (NMF) and suffer from the following two drawbacks: 1) they are unable to produce balanced partitions for large-scale manifold data which are common in real-world clusterin...

متن کامل

Ensemble Non-negative Matrix Factorization for Clustering Biomedical Documents

Searching and mining biomedical literature database, such as MEDLINE, is the main source of generating scientific hypothesis for biomedical researchers. Through grouping similar documents together, clustering techniques can facilitate user’s need of effectively finding interested documents. Since non-negative matrix factorization (NMF) can effectively capture the latent semantic space with non-...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011